Abstract: A classic experiment by Milgram shows that 
individuals can route messages along short paths in social 
networks, given only simple categorical information about 
recipients (such as "he is a prominent lawyer in Boston" or 
"she is a Freshman sociology major at Harvard"). That is, 
these networks have very short paths between pairs of nodes 
(the so-called small-world phenomenon); moreover, participants 
are able to route messages along these paths even though each 
person is only aware of a small part of the network topology. 
Some sociologists conjecture that participants in such scenarios 
use a greedy routing strategy in which they forward messages 
to acquaintances that have more categories in common with the 
recipient than they do, and similar strategies have recently been 
proposed for routing messages in dynamic ad-hoc networks of 
mobile devices. In this paper, we introduce a network property 
called membership dimension, which characterizes the cognitive 
load required to maintain relationships between participants 
and categories in a social network. We show that any connected 
network has a system of categories that will support greedy 
routing, but that these categories can be made to have small 
membership dimension if and only if the underlying network 
exhibits the small-world phenomenon. 

I. Introduction 

In a pioneering experiment in the 1960's, Stanley Milgram 
and colleagues [14], [20], [24] studied message routing in 
real-world social networks. 296 randomly chosen people 
in Nebraska and Kansas were asked to route a letter to a 
lawyer in Boston by forwarding it to an acquaintance, who 
would receive the same instructions. Messages that reached 
their destinations typically passed between at most six 
acquaintances.¹ The observation that acquaintance graphs 
have such short paths has come to be called the small-world 
phenomenon [11], [25]. 

Even more surprising than the existence of these short 
paths is that participants are able to efficiently route mes- 
sages using only local information and simple facts about 
targets, such as ethnicity, occupation, name, and location. 

¹ This observation has also led to the concept of "six degrees of 
separation" between all people on earth and the trivia game, "Six Degrees 
of Kevin Bacon," where players take turns trying to link performers to the 
actor Kevin Bacon via at most six movie collaborations. 

As a way to model the methods used by humans to route 
such messages, sociologists have studied the importance of 
categories, that is, various groups to which people belong, 
in the small-world phenomenon. In the early 1970's, Hunter 
and Shotland [8] found that messages routed between people 
in the same university category (such as student, faculty, etc.) 
had shorter paths than messages routed across categories. 
Killworth and Bernard [10] performed experiments in the 
late 1970's that they called reverse small-world experiments 
in which each participant was presented with a list of 
messages for hundreds of targets, identified by the categories of 
town, occupation, ethnic background, and gender, and asked 
to whom they would send each of these messages. The study 
concluded that the choices people make in selecting routes 
are overwhelmingly categorical in nature. In the late 1980's, 
Bernard et al. [3] extended this work to identify which of 
twenty categories are most important for message routing to 
people from various cultures. More recently, Watts et al. [26] 
present a hierarchical model for categorical organization 
in social networks for the sake of message routing. They 
propose groups as the leaves of rooted trees, with internal 
nodes defining groups-of-groups, and so on. They define 
an ultrametric on sets of such overlapping hierarchies and 
conjecture that people use the minimum distance in one of 
their trees to make message routing decisions. That is, they 
argue that individuals can understand their "social distance" 
to a target as the minimum distance between them and 
the target in one of their categories. Such a determination 
requires some global knowledge about the structures of the 
various group hierarchies. 

Although this previous work shows the importance of 
categories and of hierarchies of categories in explaining the 
small world phenomenon, it does not explain where the 
categories come from or what properties they need to have 
in order to allow greedy routing to work. Hence, this prior 
work leaves open the following questions: 

• Which social networks support systems of categories 
that allow participants to route messages using the sim- 
ple greedy rule of sending a message to an acquaintance 
who has more categories in common with the target? 

• How complicated a system of categories is needed for 
this purpose, and what properties of the underlying 
network can be used to characterize the complexity of 
the category system? 




Our goal in this paper, therefore, is to address these 
questions by studying the existence of mathematical and 
algorithmic frameworks that demonstrate the feasibility of 
local, greedy, category-based routing in social networks. 

A. Our Results 

Inspired by the work of Watts et al. [26], we view a social 
network as an undirected graph $G = (U, E)$, whose vertices 
represent people and whose edges represent relationships, 
taken together with a collection, $S \subseteq 2^U$, of categories 
defined on the vertices in $G$. Figure 1 shows an example. In 
addition, given a network $G = (U, E)$ and category system 
$S$, we define the membership dimension of $S$ to be 

$\max_{u \in U} |\{C \in S : u \in C\}|,$ 

that is, the maximum number of groups to which any one 
person in the network belongs. The membership dimension 
characterizes the cognitive load of performing routing tasks 
in the given system of categories: if the membership 
dimension is small, each actor in the network only needs to 
know a proportionately small amount of information about 
his or her own categories, his or her neighbors' categories, 
and the categories of each message's eventual destination. 
Thus, we would expect real-world social networks to have 
small membership dimension. 

Figure 1. A set of elements $U$ (drawn arbitrarily as points in the 
plane). (a) The graph $G$ on $U$. (b) The categories $S$ on $U$. In this 
example, the membership dimension is 4, because no element is 
contained in more than 4 groups. 

In this paper, we provide a constructive proof that a 
category system can support greedy routing. Our results 
are not intended to model the actual formation of social 
categories, and we take no position on whether categories 
are formed from the network, the network is formed from 
categories, or both form together. Rather, our intention is 
to show the close relation between two natural parameters 
of a social network, its path length and its membership 
dimension. In particular: 

• We show that the membership dimension of $(G, S)$ 
must be at least the diameter of $G$, $\mathrm{diam}(G)$, for a 
local, greedy, category-based routing strategy to work. 

• We show that every connected graph $G = (U, E)$ has 
a collection $S$ of categories such that local, greedy, 
category-based routing always works, with membership 
dimension $O((\mathrm{diam}(G) + \log |U|)^2)$. 

Since Milgram's work [14], [20], [24], social scientists 
have believed that real-world social networks have diameters 
bounded by constants or slowly growing functions of the 
network size. Under a weak form of this assumption, that the 
diameter is $O(\log |U|)$, our results provide a natural model 
for how participants in a social network could efficiently 
route messages using a local, greedy, category-based routing 
strategy while remembering an amount of information that 
is only polylogarithmic in the size of the network. 

B. Previous Related Work 

Geometric greedy routing [6], [15] uses geographic location 
rather than categorical data to route messages. In 
this method, vertices have coordinates in a geometric metric 
space and messages are routed to any neighbor that is closer 
to the target's coordinates. Greedy routing may not succeed 
in certain geometric networks, so a number of techniques 
have been developed to assist such greedy routing schemes 
when they fail [4]. Introduced by Rao et al. [23], 
virtual coordinates can overcome the shortcomings of real- 
world coordinates and allow simple greedy forwarding to 
function without the assistance of fallback algorithms. This 
approach has been explored by other researchers [1], 
who study various network properties that allow 
for greedy routing to succeed. Several researchers also study 
the existence of succinct greedy-routing strategies [5], [7], 
[21], where the number of bits needed to represent 
the coordinates of each vertex is polylogarithmic in the size 
of the network; this notion of succinctness for geometric 
greedy routing is closely analogous to our definition of the 
membership dimension for categorical greedy routing. 

Recent work by Mei et al. [19] studies category-based 
greedy routing as a heuristic for performing routing in 
dynamic delay-tolerant networks. Mei et al. assume that 
the network nodes have been organized into pre-defined 
categories based on the users' interests. Experiments suggest 
that using these categories for greedy routing is superior 
to routing heuristics based on location or simple random 
choices. One can interpret the categorical greedy routing 
techniques of Mei et al. and of this paper as being geometric 
routing schemes using virtual coordinates, where the coordi- 
nates represent category memberships. In this interpretation, 
the membership dimension of an embedding corresponds to 
the number of nonzero coordinates of each node, and our 
results show that such greedy routing schemes can be done 
succinctly in graphs with small diameter. 

Figure 2. Illustration of the routing rule. 

Similarly to the work of this paper, Kleinberg [11] studies 
the small-world phenomenon from an algorithmic perspective. 
However, his approach is orthogonal to ours: he focuses 
on location rather than categorical information as the critical 
factor for the ability to find short routes efficiently, and 
constructs a random network based on that information, 
whereas our approach takes the network as a given and 
studies the kinds of categorical structures needed to support 
category-based greedy routing. 

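In addition, it is worth noting that many real-world small-world 
networks also exhibit scale-free properties, although the small-world 
property by itself does not imply a scale-free degree distribution. 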

II. Routing based on Categorical Information 

In this section, we introduce a mathematical model of 
categorical greedy routing, and provide basic definitions and 
properties that guarantee the success of this strategy. 

A. Basic definitions 

Abstracting away the social context, let U be the universe 
of n people defining the potential sources, targets, and 
intermediates for message routes, and let G = (U,E) be an 
undirected graph whose m edges represent pairs of people 
who can communicate. For any two elements $s, t \in U$, 
let $\mathrm{sp}(s, t)$ be the length of the shortest path in $G$ from 
$s$ to $t$. The diameter $\mathrm{diam}(G) = \max_{s, t \in U} \mathrm{sp}(s, t)$ is 
the maximum length of any shortest path. For $s \in U$, 
define the neighborhood of $s$ to be the set of neighbors 
$N(s) = \{u \in U \mid \{s, u\} \in E\}$ of $s$ in $G$. 

Now let $S \subseteq 2^U$ be a set of subsets of $U$, which represent 
the abstract categories that elements of $U$ belong to. For a 
given $u \in U$, we define $\mathrm{cat}(u) \subseteq S$ to be the set of groups 
to which $u$ belongs: $\mathrm{cat}(u) = \{C \in S \mid u \in C\}$. 

Definition 1 (membership dimension): The membership 
dimension of $S$ is the maximum number of elements of $S$ 
that any element of $U$ is contained in, that is, 

$\mathrm{memdim}(S) = \max_{u \in U} |\mathrm{cat}(u)|.$ 

As discussed, there is evidence that in real world social 
networks and group structures $(G, S)$, both $\mathrm{diam}(G)$ and 
$\mathrm{memdim}(S)$ are significantly smaller than $|U|$. 
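
As a minimal sketch of Definition 1, assuming categories are represented as Python frozensets over a small universe, the membership dimension can be computed directly. The function names `cat` and `memdim` mirror the notation above, but the representation and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import FrozenSet, Hashable, Iterable, Set

Category = FrozenSet[Hashable]


def cat(u: Hashable, categories: Iterable[Category]) -> Set[Category]:
    """Return cat(u): the categories in S that contain u."""
    return {C for C in categories if u in C}


def memdim(universe: Iterable[Hashable], categories: Iterable[Category]) -> int:
    """Return the largest number of categories that any one element belongs to."""
    categories = list(categories)
    return max((len(cat(u, categories)) for u in universe), default=0)


# Hypothetical toy instance: four people and three categories.
U = ["a", "b", "c", "d"]
S = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"b", "c", "d"})]

print({u: len(cat(u, S)) for u in U})  # per-person membership counts
print(memdim(U, S))                    # "b" belongs to all three categories
```

In this toy instance the membership dimension is determined by the single element that belongs to the most categories, which is the quantity the definition bounds.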

B. The routing strategy 

We now describe a simple greedy category-based strategy 
to route a message from one node to another. We clarify the 
distance function immediately following the rule definition. 

Figure 3. Two networks with the same elements and categories. 
(a) An example that is internally connected, but not shattered: no 
category contains $y$ and a neighbor of $v$ but not $v$ itself. (b) An 
example that is shattered, but not internally connected: the induced 
graph of category $\{u, w, x, z\}$ is not connected. 



Definition 2 (greedy routing rule): If a node $u$ receives a 
message $M$ intended for a destination $w \neq u$, then $u$ should 
forward $M$ to a neighbor $v \in N(u)$ that is closer to $w$ than 
$u$ is, that is, for which $d(v, w) < d(u, w)$. 

The category-based distance function used by this rule is 
$d(s, t) = |\mathrm{cat}(t) \setminus \mathrm{cat}(s)|$, which measures the number of 
categories of the target that the current node does not share.² 
This number decreases as the number of shared groups of $S$ 
between the current node and the target increases. We refer 
to the greedy routing strategy that uses this distance function 
as ROUTING (see Figure 2). 

For category systems with low membership dimension, 
this strategy is easy to evaluate using only local knowledge 
about the categories of each neighbor of the current node 
and the categories of the target node. 
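
The rule and its distance function can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the adjacency-list and dictionary representations, the function names, the category labels, and the tie-breaking choice are all assumptions made for the example.

```python
from typing import Dict, Hashable, List, Optional, Set


def distance(s: Hashable, t: Hashable, cats: Dict[Hashable, Set[str]]) -> int:
    """d(s, t): the number of categories of t that s does not share."""
    return len(cats[t] - cats[s])


def route(graph: Dict[Hashable, List[Hashable]],
          cats: Dict[Hashable, Set[str]],
          source: Hashable,
          target: Hashable) -> Optional[List[Hashable]]:
    """Follow the greedy rule; return the path, or None if some node gets stuck."""
    path = [source]
    current = source
    while current != target:
        here = distance(current, target, cats)
        closer = [v for v in graph[current] if distance(v, target, cats) < here]
        if not closer:
            return None  # no neighbor is strictly closer, so greedy routing fails
        # The rule allows any closer neighbor; picking the closest one is an
        # arbitrary tie-breaking choice made here for determinism.
        current = min(closer, key=lambda v: distance(v, target, cats))
        path.append(current)
    return path


# Hypothetical example: the path 0 - 1 - 2 with four named categories.
graph = {0: [1], 1: [0, 2], 2: [1]}
cats = {0: {"A1", "A2"}, 1: {"A2", "B0"}, 2: {"B0", "B1"}}

print(route(graph, cats, 0, 2))
print(route(graph, cats, 2, 0))
```

Because the distance strictly decreases at every hop and is a non-negative integer, the loop always terminates, either at the target or at a node with no closer neighbor.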

C. Successful routing 

We now investigate conditions under which ROUTING 
can successfully route messages between all pairs of nodes 
in a network. We identify several properties of a graph G 
and associated group structure S that directly influence the 
feasibility of routing. For routing to succeed, G must be 
connected. It seems natural to consider a stronger property: 

Definition 3 (internally connected): $(G, S)$ is internally 
connected if for each $C \in S$, $G$ restricted to $C$ is connected. 

Figure 3(a) shows an example of an internally connected 
pair $(G, S)$. This is a very natural property for sociological 
groups to exhibit. People belonging to the same group tend to 
have greater cohesiveness, and if a group is not internally 
connected then it may be redefined to be the set of groups 
defined by its connected components. 

Definition 4 (shattered): A pair $(G, S)$ is shattered if, for 
all $s, t \in U$ with $s \neq t$, there is a neighbor $u \in N(s)$ and a set 
$C \in S$ such that $C$ contains $u$ and $t$, but not $s$. 



Figure 3(b) shows an example of a shattered pair. Note 
that in this definition, $u$ and $t$ could be the same node. This 
property falls out naturally from the instructions given in the 
real-world routing experiments of Milgram and others. In 
order for someone to advance a letter toward a target, there 
must be an acquaintance that shares additional interests with 
the target. Indeed, we now show that the shattered property 
is necessary for ROUTING to work. 

² Note that $d$ is not a metric, since it is not necessarily symmetric. 

Figure 4. ROUTING does not work in this graph, even though it 
is internally connected and shattered. Routing from $v$ to $x$ fails: $v$, 
$u$, and $w$ are all at distance 2 from $x$, so $v$ has no neighbor that 
is closer than it to $x$. 

Figure 5. The sets $B_v$ for each vertex $v$ in the path. The sets $A_v$ 
are constructed symmetrically. 

Lemma 1: If $(G, S)$ is not shattered, ROUTING fails. 
Proof: Since $(G, S)$ is not shattered, there exists a pair 
of distinct vertices $s$ and $t$ such that no neighbor of $s$ belongs 
to a set that contains $t$ but not $s$. Every category that a 
neighbor $u$ of $s$ shares with $t$ is therefore also shared by $s$, so 
$d(u, t) \geq d(s, t)$ for every $u \in N(s)$. Hence no neighbor of $s$ is 
closer to $t$ than $s$ is, and ROUTING will fail to route from $s$ to $t$. ■ 
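
Both properties can be checked mechanically on small instances. The following is a minimal sketch under assumed representations (an adjacency-list dictionary for $G$ and a list of frozensets for $S$); the function names and the three toy category systems are hypothetical and serve only to illustrate Definitions 3 and 4.

```python
from collections import deque


def induces_connected_subgraph(graph, C):
    """Check that G restricted to the vertex set C is connected."""
    C = set(C)
    if not C:
        return True  # convention assumed here: an empty category is connected
    start = next(iter(C))
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in graph[x]:
            if y in C and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen == C


def is_internally_connected(graph, S):
    return all(induces_connected_subgraph(graph, C) for C in S)


def is_shattered(graph, S):
    """For all s != t, some C contains t and a neighbor of s, but not s."""
    for s in graph:
        for t in graph:
            if s == t:
                continue
            witness = any(
                t in C and s not in C and any(u in C for u in graph[s])
                for C in S
            )
            if not witness:
                return False
    return True


# Hypothetical instances on the path 0 - 1 - 2.
graph = {0: [1], 1: [0, 2], 2: [1]}
S_good = [frozenset({0}), frozenset({0, 1}), frozenset({1, 2}), frozenset({2})]
S_split = S_good + [frozenset({0, 2})]          # {0, 2} is not connected
S_weak = [frozenset({0}), frozenset({0, 1}), frozenset({2})]

print(is_internally_connected(graph, S_good), is_shattered(graph, S_good))
print(is_internally_connected(graph, S_split), is_shattered(graph, S_split))
print(is_internally_connected(graph, S_weak), is_shattered(graph, S_weak))
```

The three instances show that the two properties are independent: the first satisfies both, the second is shattered but not internally connected, and the third is internally connected but not shattered.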

If G is a tree, then these two properties together are 
sufficient for the routing strategy to always work: 

Lemma 2: If G is a tree, and (G, S) is internally con- 
nected and shattered, then ROUTING is guaranteed to work. 

Proof: Let $s$ and $t$ be vertices in $G$. Since $G$ is a tree, 
there is exactly one simple path from $s$ to $t$. Let $(u, v)$ be an edge 
on the path from $s$ to $t$. First, we claim that every set in 
$S$ that contains both $u$ and $t$ also contains $v$. This follows 
from $(G, S)$ being internally connected: any set $C \in S$ with 
$u, t \in C$ must also contain $v$, since $v$ is on the only path 
between $u$ and $t$. Therefore, $v$ is contained in at least as many 
sets in $S$ with $t$ as $u$ is. However, by the shattered property, 
some neighbor of $u$ is in a set in $S$ with $t$ that does not contain $u$. 
This neighbor must be $v$: the set is connected, and the path in $G$ 
from any other neighbor of $u$ to $t$ passes through $u$. Therefore $v$ 
is in strictly more sets with $t$ than $u$ is. This property holds 
for every edge of every such path; hence, ROUTING always works. ■ 

Although sufficient for routing in trees, the internally 
connected and shattered properties are not sufficient for 
ROUTING to work on arbitrary connected graphs. Figure 4 
shows a counter-example: ROUTING is unable to route a 
message from the leftmost to the rightmost node, since there 
is no neighbor whose distance to the target is smaller. 

III. Existence of Categories 

In this section, we consider the following question: Is it 
possible to construct the family S so that ROUTING always 
works and S has low membership dimension? 

We show that such a construction is always possible if 
we are given a connected graph as input. We also show that 
it is impossible to construct an S such that ROUTING will 
work if the graph is not known in advance. 



A. Constructing S given G 

Given a connected graph G = (U,E) as input, we would 
like to construct a family $S \subseteq 2^U$ so that ROUTING works, 
and the membership dimension of S is small. We concentrate 
foremost on constructions of category collections that are 
internally connected and shattered, because of the social 
significance of these properties. Nevertheless, even without 
these properties, we have the following lower bound. 

Lemma 3: Let $G$ and $S$ be a graph and a category system, 
respectively, such that ROUTING works for $G$ and $S$. Then 
$\mathrm{memdim}(S) \geq \mathrm{diam}(G)$. 

Proof: By definition of the diameter, there are two 
vertices $s, t \in U$ such that $\mathrm{sp}(s, t) = \mathrm{diam}(G)$. Let $P$ be 
the path that ROUTING follows from $s$ to $t$, and note that 
the length $|P|$ of $P$ must be at least $\mathrm{diam}(G)$. An edge $(u, v)$ 
can only be on $P$ if $d(v, t) < d(u, t)$. Since $d$ can 
only take integer values, $d(u, t) \geq d(v, t) + 1$. Summing over 
the edges of $P$ and using $d(t, t) = 0$, we get 
$d(s, t) \geq |P|$. By definition, $d(s, t) = |\mathrm{cat}(t) \setminus \mathrm{cat}(s)|$, and 
$\mathrm{memdim}(S)$ is the maximum of $|\mathrm{cat}(\cdot)|$ over all elements; 
hence $\mathrm{memdim}(S) \geq |\mathrm{cat}(t)| \geq |\mathrm{cat}(t) \setminus \mathrm{cat}(s)| = 
d(s, t) \geq |P| \geq \mathrm{diam}(G)$, as claimed. ■ 

For paths, this bound is tight: 

Lemma 4: If $G$ is a path, then there exists an $S$ s.t. $(G, S)$ 
is shattered and internally connected with $\mathrm{memdim}(S) = 
\mathrm{diam}(G)$. 

Proof: Arbitrarily pick one of the two end vertices of 
$G$ and let us refer to the vertices in $G$ by their distance, $0$ to 
$n - 1$, from this vertex. For each vertex $i$, form two sets $A_i$ 
and $B_i$, where $A_i = \{0, \ldots, i - 1\}$ and $B_i = \{i + 1, \ldots, n - 1\}$, 
and let $S = \bigcup_{v \in U} \{A_v, B_v\}$. Figure 5 illustrates this 
construction. Each set in $S$ consists of a path of vertices 
and therefore $S$ is internally connected. $S$ is also shattered, 
since for all $s$ and $t$, $s$ has a neighbor that shares either 
$A_s$ or $B_s$ with $t$, but $s$ is not in these sets. Considering 
$\mathrm{memdim}(S)$, note that each vertex $i$ is contained in sets $A_j$ 
for $i < j \leq n - 1$ and $B_k$ for $0 \leq k < i$. Therefore, each 
vertex is in exactly $n - 1$ sets, which is $\mathrm{diam}(G)$. ■ 
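
The path construction is small enough to check directly. The sketch below builds the sets $A_i$ and $B_i$ for a path on vertices $0, \ldots, n-1$, computes the membership dimension, and runs the greedy rule between all pairs. The helper names are hypothetical, and dropping the two empty sets ($A_0$ and $B_{n-1}$) is a simplification assumed here that does not affect any vertex's memberships.

```python
def path_categories(n):
    """Return the non-empty sets A_i = {0..i-1} and B_i = {i+1..n-1}."""
    S = set()
    for i in range(n):
        S.add(frozenset(range(0, i)))       # A_i
        S.add(frozenset(range(i + 1, n)))   # B_i
    S.discard(frozenset())  # A_0 and B_{n-1} are empty and contain no vertex
    return S


def memdim(n, S):
    """Largest number of categories containing any single vertex."""
    return max(sum(1 for C in S if v in C) for v in range(n))


def greedy_route(n, S, s, t):
    """Greedy category-based routing on the path 0 - 1 - ... - (n-1)."""
    cat = {v: {C for C in S if v in C} for v in range(n)}

    def d(x):
        return len(cat[t] - cat[x])  # categories of t that x does not share

    path = [s]
    while path[-1] != t:
        u = path[-1]
        closer = [v for v in (u - 1, u + 1) if 0 <= v < n and d(v) < d(u)]
        if not closer:
            return None
        path.append(closer[0])
    return path


n = 6
S = path_categories(n)
print(memdim(n, S), n - 1)            # membership dimension vs. diameter
print(greedy_route(n, S, 0, n - 1))   # route between the two end vertices
print(all(greedy_route(n, S, s, t) is not None
          for s in range(n) for t in range(n)))
```

In this construction the category distance between two vertices coincides with their distance along the path, which is why the greedy route is also a shortest route.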

It follows from Lemmas 2 and 4 that, if $G$ is a path, 
one can construct $S$ with $\mathrm{memdim}(S) = \mathrm{diam}(G)$, so that 
ROUTING works in $G$. 

There are other graphs for which it is relatively easy to set 
up a category set that is shattered and internally connected in 
a way that supports the ROUTING algorithm. For example, 
in a tree of height 1 (i.e., a star graph), with root r, we could 
create for each leaf of the tree two categories, one containing 
the leaf itself and one containing both the leaf and the root. 
Every path in this tree supports ROUTING. However, the 
membership dimension of this category system is high, since 
the root belongs to a linear number of categories. So even in 
this simple example, supporting ROUTING and achieving 
low membership dimension is a challenge. Moreover, this 
challenge becomes even more difficult already for a tree of 
height 2, since navigating from any leaf, $x$, to another leaf, $y$, 
requires that the parent of $x$ belong to more categories with $y$ 
than $x$ does, and this must be true for every other leaf, $y$. Thus, 
it is perhaps somewhat surprising that we can construct a 
set of categories, $S$, for an arbitrary binary tree that causes 
this network to be shattered and internally connected (so the 
ROUTING strategy works, by Lemma 2) and such that $S$ 
has small membership dimension. 

Figure 6. The collection of sets $L_v$ for an example subtree at $v$. 

Lemma 5: If $G$ is a binary tree, then there exists an 
$S$ s.t. $(G, S)$ is shattered and internally connected with 
$\mathrm{memdim}(S) = O(\mathrm{diam}^2(G))$. 

Proof: We show how to construct $S$ from $G$. Arbitrarily 
pick a vertex $r \in U$ of degree at most 2 and root the binary 
tree at $r$, so each vertex $v$ has left and right children, $\mathrm{left}(v)$ 
and $\mathrm{right}(v)$, and let $\mathrm{height}(v)$ be the length of the longest 
simple path from $v$ to any descendant of $v$. For each vertex 
$v$, we create a set $S_v$, containing $v$'s descendants (which 
includes $v$). We further construct two families, $L_v$ and $R_v$, 
using helper sets $L_{v,i}$ and $R_{v,i}$. Let $L_{v,i}$ (resp., $R_{v,i}$) consist 
of $v$, the vertices in $v$'s left (right) subtree down to depth $i$, 
and all vertices in $v$'s right (left) subtree. Then define 

$L_v = \{L_{v,i} \mid \mathrm{depth}(v) \leq i \leq \mathrm{depth}(v) + \mathrm{height}(\mathrm{left}(v))\}.$ 

Figure 6 illustrates this. The family $R_v$ is defined symmetrically. 
Our $S$ is then defined as 

$S = \bigcup_{v \in U} \left(\{S_v\} \cup L_v \cup R_v\right).$ 

Each set in $S$ is a connected subgraph of $G$, so $S$ is internally 
connected. As the following argument shows, $S$ is shattered: 
If $s$ is an ancestor of $t$, then $s$'s child $u$ on the path to $t$ is 
in set $S_u$ which contains $u$ and $t$ but not $s$. Otherwise, let 
$v$ be the lowest common ancestor of $s$ and $t$, and assume 
without loss of generality that $s$ is in $v$'s left subtree; then 
$L_{v,\mathrm{depth}(s)-1}$ contains $s$'s parent and $t$ but not $s$. 

We now analyze the membership dimension of this construction. 
Let $v$ be a vertex, and let $\mathrm{ancestors}(v)$ be the set of 
$v$'s ancestors. For $u \in \mathrm{ancestors}(v)$, $v \in S_u$, and $v$ belongs 
to $O(\mathrm{height}(u))$ sets of $L_u$ and $R_u$. Then $v$ belongs to 
$O\left(\sum_{u \in \mathrm{ancestors}(v)} \mathrm{height}(u)\right)$ sets, which is $O(\mathrm{diam}^2(G))$ 
for any $v$. ■ 
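
A small sketch can make the binary-tree construction concrete. It assumes the tree is given as a dictionary mapping each vertex to a `(left, right)` pair (with `None` for a missing child), measures depth from the root, and takes the index $i$ of $L_{v,i}$ and $R_{v,i}$ over the inclusive range from $\mathrm{depth}(v)$ to $\mathrm{depth}(v)$ plus the height of the corresponding child. All function names and the seven-vertex example are hypothetical.

```python
def build_categories(children, root):
    """Build the sets S_v, L_{v,i}, R_{v,i} for a rooted binary tree."""
    depth, height, sub = {}, {}, {}

    def visit(v, d):
        depth[v], sub[v], height[v] = d, {v}, 0
        for c in children[v]:
            if c is not None:
                visit(c, d + 1)
                sub[v] |= sub[c]
                height[v] = max(height[v], height[c] + 1)

    visit(root, 0)
    S = set()
    for v, (left, right) in children.items():
        S.add(frozenset(sub[v]))  # S_v: v together with its descendants
        # (near, far) = (left, right) yields L_{v,i}; the swap yields R_{v,i}.
        for near, far in ((left, right), (right, left)):
            if near is None:
                continue
            far_side = sub[far] if far is not None else set()
            for i in range(depth[v], depth[v] + height[near] + 1):
                top = {x for x in sub[near] if depth[x] <= i}
                S.add(frozenset({v} | top | far_side))
    return S


def tree_neighbors(children):
    nbrs = {v: [] for v in children}
    for v, kids in children.items():
        for c in kids:
            if c is not None:
                nbrs[v].append(c)
                nbrs[c].append(v)
    return nbrs


def greedy_route(nbrs, S, s, t):
    cat = {v: {C for C in S if v in C} for v in nbrs}

    def d(x):
        return len(cat[t] - cat[x])  # categories of t that x does not share

    path = [s]
    while path[-1] != t:
        closer = [v for v in nbrs[path[-1]] if d(v) < d(path[-1])]
        if not closer:
            return None
        path.append(closer[0])
    return path


# Hypothetical complete binary tree on 7 vertices, rooted at 0.
children = {0: (1, 2), 1: (3, 4), 2: (5, 6),
            3: (None, None), 4: (None, None), 5: (None, None), 6: (None, None)}
S = build_categories(children, 0)
nbrs = tree_neighbors(children)

print(max(sum(1 for C in S if v in C) for v in children))  # membership dimension
print(greedy_route(nbrs, S, 3, 5))                         # leaf-to-leaf route
```

The recursion in `visit` is adequate for small examples; a deep tree would need an iterative traversal instead.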

We now extend this result to arbitrary trees by applying 
weight-balanced binary trees [2], [13]. 

Definition 5 (weight balanced binary tree): A weight 
balanced binary tree is a binary tree that stores weighted 
items in its leaves. If item $i$ has weight $w_i$, and all items 
have a combined weight of $W$, then item $i$ is stored at depth 
$O(\log(W / w_i))$. 

Lemma 6: Let T be an n-node rooted tree with height h. 
We can embed T into a binary tree such that the ancestor- 
descendant relationship is preserved, and the resulting tree 
has height $O(h + \log n)$. 

Proof: Let $n_u$ be the number of descendants of vertex $u$ 
in $T$. For each vertex $u$ in $T$ that has more than two children, 
we expand the subtree consisting of $u$ and $u$'s children into 
a binary tree as follows. Construct a weight balanced binary 
tree $B$ on the children of $u$, where the weight of a child 
$v$ is $n_v$. We let $u$ be the root of $B$. Each child $v$ of $u$ in 
the original tree is then a leaf at depth $O(\log(n_u / n_v))$ in $B$. 
Performing this construction for each vertex $u$ in the tree 
expands $T$ into a binary tree with the ancestor-descendant 
relationship preserved from $T$. 

Furthermore, each path from root to leaf in $T$ is only 
expanded by $O(\log n)$ nodes, which we can see as follows. 
Each parent-to-child edge $(u, v)$ in $T$ is replaced by a path 
of length $O(\log(n_u / n_v))$. Therefore for each path $P$ from 
root $r$ to leaf $l$ in $T$, our construction expands $P$ by length 
$O\left(\sum_{(u,v) \in P} \log(n_u / n_v)\right)$, which is a sum telescoping to 
$O(\log(n_r / n_l)) = O(\log n)$. Therefore, the height of the new 
binary tree is $O(h + \log n)$. ■ 
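
The telescoping step can be written out explicitly. As an illustration, label the vertices of a root-to-leaf path as $r = v_0, v_1, \ldots, v_k = l$ (these labels are introduced here only for the example), and assume that each vertex counts as its own descendant, so that $n_r = n$ and $n_l = 1$:

```latex
\sum_{i=1}^{k} \log\frac{n_{v_{i-1}}}{n_{v_i}}
  = \sum_{i=1}^{k} \bigl(\log n_{v_{i-1}} - \log n_{v_i}\bigr)
  = \log n_{v_0} - \log n_{v_k}
  = \log\frac{n_r}{n_l}
  = \log n .
```

Every intermediate term $\log n_{v_i}$ appears once with each sign and cancels, so the total expansion along the path depends only on the subtree sizes at its two ends.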

Combining this lemma with Lemmas 2 and 5, we get the following 
theorem. 

Theorem 1: Given a tree $T$, it is possible to construct a 
family $S$ of subsets such that ROUTING works for $T$ and 
$\mathrm{memdim}(S) = O((\mathrm{diam}(T) + \log n)^2)$. 

Proof: Arbitrarily root $T$ and embed $T$ in a binary 
tree $B$ using the method in Lemma 6. Then $B$ has 
height $O(\mathrm{diam}(T) + \log n)$, and diameter $\mathrm{diam}(B) = 
O(\mathrm{diam}(T) + \log n)$. Applying the construction from 
Lemma 5 to $B$ gives us a family $S_B$ with $\mathrm{memdim}(S_B) = 
O((\mathrm{diam}(T) + \log n)^2)$. We then construct a family $S_T$ 
by removing vertices that are in $B$ but not $T$ from the 
sets in $S_B$. By construction, $(T, S_T)$ is shattered and 
internally connected, and $\mathrm{memdim}(S_T) \leq \mathrm{memdim}(S_B) = 
O((\mathrm{diam}(T) + \log n)^2)$. By Lemma 2, ROUTING works on 
$T$ with category sets from $S_T$. ■ 

We can further extend this theorem to arbitrary connected 
graphs, which is the main upper bound result of this paper. 

Theorem 2: If $G$ is connected, there exists $S$ s.t. ROUTING 
works and $\mathrm{memdim}(S) = O((\mathrm{diam}(G) + \log n)^2)$. 

Proof: Compute a low-diameter spanning tree $T$ of 
$G$. This step can easily be done using breadth-first search, 
producing a tree with diameter at most $2\,\mathrm{diam}(G)$. We then 
use the construction from Theorem 1 on $T$. For greedy 
routing to work in a graph $G$, note that it is sufficient to 
show that it works in a spanning tree of $G$: every neighbor of a 
node in $T$ is also its neighbor in $G$, so a closer neighbor 
always exists. Therefore, since 
ROUTING works in $T$, ROUTING also works in $G$. ■ 
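
The spanning-tree step can be sketched as follows. In a breadth-first search tree every vertex lies within $\mathrm{diam}(G)$ tree edges of the root, so any two vertices are joined through the root by at most $2\,\mathrm{diam}(G)$ tree edges. The code assumes a connected graph given as an adjacency-list dictionary; the function names and the 6-cycle example are hypothetical.

```python
from collections import deque


def bfs(graph, root):
    """Return BFS distances and parent pointers from root."""
    dist, parent = {root: 0}, {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent


def bfs_spanning_tree(graph, root):
    """Return the BFS tree as an adjacency-list dictionary."""
    _, parent = bfs(graph, root)
    tree = {v: [] for v in graph}
    for v, p in parent.items():
        if p is not None:
            tree[v].append(p)
            tree[p].append(v)
    return tree


def diameter(graph):
    """Diameter of a connected graph via one BFS per vertex."""
    return max(max(bfs(graph, v)[0].values()) for v in graph)


# Hypothetical example: a cycle on 6 vertices.
graph = {i: [(i + 1) % 6, (i - 1) % 6] for i in range(6)}
tree = bfs_spanning_tree(graph, 0)

print(diameter(graph), diameter(tree))
```

The all-pairs diameter computation is only used to check the bound in the example; the construction itself needs a single breadth-first search.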

IV. Conclusion and Open Problems 

We have presented a construction of groups $S$ on a 
connected graph $G$ that allows a simple greedy routing algorithm, 
utilizing a notion of distance on group membership, to 
guarantee delivery between nodes in $G$. Such a construction 
will have membership dimension $O((\mathrm{diam}(G) + \log n)^2)$, 
demonstrating a small cognitive load for the members of $G$ 
whenever $G$ has small diameter. 

There are several directions for future work. For example, 
while we have shown that the membership dimension must 
be at least the diameter of $G$, it remains to be shown 
whether the membership dimension must be the square of the 
diameter plus a logarithmic factor for arbitrary graphs. We 
conjecture that the square term is not strictly needed in the 
membership dimension in order for ROUTING to work. 
Our group construction is performed for a general graph 
by selecting a low diameter spanning tree and using the 
presented tree construction, so it may be possible that there is 
a group construction that has lower membership dimension 
and more efficient routing if it is constructed directly in G. 

In this paper we assume that all categories are given equal weight with 
respect to routing tasks and that participants use a simple 
greedy routing algorithm based solely on increasing the 
number of categories in common with the target. Future 
work could include study of a category-based routing strategy 
that allows participants to weight various categories 
higher than others, as in the work of Bernard et al. [3]. 